W13. Minimum Spanning Trees, Disjoint Sets Union, and Kruskal’s Algorithm

Author

Nikolai Kudasov

Published

April 14, 2026

1. Summary

1.1 Greedy Algorithms
1.1.1 Optimization Problems

Many important algorithmic problems are optimization problems: there is a large set of feasible solutions, each with an associated cost or value, and the goal is to find one with minimum or maximum objective value. Such problems are recognisable by formulations using words like “shortest”, “minimum”, “maximum”, and “cheapest”.

This differs from a decision problem, where we only ask whether some feasible solution exists. For example, asking whether there is a path from \(u\) to \(v\) is a decision problem, while asking for the shortest path is an optimization problem. The distinction matters because the optimization version is often much harder: brute force usually requires examining exponentially many candidates.

1.1.2 The Greedy Pattern

A greedy algorithm builds a solution incrementally and commits to a locally best-looking choice at each step. Greedy methods are attractive because they are usually simple and fast. However, greediness is not automatically correct: some problems admit no correct greedy strategy at all.

For minimum spanning trees, greedy algorithms are correct. The reason is structural: there is a theorem, the cut property, that identifies which local choices are always safe. Prim’s and Kruskal’s algorithms both work by repeatedly choosing such safe edges.

1.2 Minimum Spanning Trees
1.2.1 Spanning Trees and Their Weight

Let \(G = (V, E)\) be an undirected connected graph. A spanning tree of \(G\) is a subset of edges \(T \subseteq E\) such that \((V, T)\) is connected and acyclic. Equivalently, it is a tree that reaches every vertex of the graph. Any spanning tree on \(|V|\) vertices contains exactly \(|V| - 1\) edges.

If each edge has a real-valued weight, then the weight of a spanning tree is

\[w(T) = \sum_{e \in T} w(e).\]

A minimum spanning tree (MST) is a spanning tree whose total weight is as small as possible. If all edge weights are equal, then every spanning tree is automatically an MST, because all spanning trees have the same number of edges and therefore the same total weight.

If the graph is disconnected, a spanning tree does not exist. The right notion is then a spanning forest: a union of spanning trees, one for each connected component. A minimum spanning forest minimises the total weight over all such forests.

The historical origin of the MST problem is often traced to Otakar Borůvka’s 1926 work on designing a low-cost electrical network in Moravia: connect all required locations while using as little total cable length as possible.

1.2.2 Example: Finding an MST

Consider the following weighted undirected graph on six vertices:

(Figure: the graph has vertices \(A\)–\(F\) and edges \(A\text{–}B\,(6)\), \(A\text{–}D\,(4)\), \(B\text{–}D\,(7)\), \(B\text{–}C\,(10)\), \(B\text{–}E\,(7)\), \(D\text{–}E\,(12)\), \(C\text{–}D\,(8)\), \(C\text{–}E\,(5)\), \(C\text{–}F\,(6)\), \(E\text{–}F\,(7)\).)

One MST consists of edges \(A\text{–}D\) (4), \(C\text{–}E\) (5), \(A\text{–}B\) (6), \(C\text{–}F\) (6), and \(B\text{–}E\) (7). Its total weight is

\[4 + 5 + 6 + 6 + 7 = \mathbf{28}.\]

1.2.3 The Cut Property

The key theorem behind MST algorithms is the cut property. A cut \((S, V \setminus S)\) partitions the vertex set into two non-empty parts. A cut respects a set of already chosen edges \(A\) if no edge in \(A\) crosses from \(S\) to \(V \setminus S\). A light edge for a cut is a crossing edge of minimum weight.

Theorem (Cut Property). Let \(A \subseteq E\) be a set of edges contained in some MST. If \((S, V \setminus S)\) is a cut that respects \(A\) and \((u, v)\) is a light edge crossing that cut, then \(A \cup \{(u, v)\}\) is also contained in some MST.

This theorem says that once we have a partial solution that could still extend to an MST, the cheapest edge crossing an appropriate cut is always safe to add. Prim’s and Kruskal’s algorithms differ in how they define the cut, but both repeatedly use this same principle.

1.3 Prim’s Algorithm
1.3.1 A Naive Growing Strategy

A natural first idea is to grow a tree from a chosen root by always taking the cheapest edge from the current tree to a vertex outside the tree. This strategy is conceptually close to Prim’s algorithm, but a direct implementation often stores edges in the priority queue, not vertices. That means the queue can contain many obsolete edges whose outside endpoint has already been absorbed into the tree.

The naive method is still correct for MST construction, but it is less elegant and often less efficient to implement than the standard formulation of Prim’s algorithm.

1.3.2 Prim’s Algorithm

Prim’s algorithm grows the MST from a root vertex \(r\) while maintaining, for each non-tree vertex \(v\), the minimum known weight of an edge connecting \(v\) to the current tree. This value is stored as the vertex’s key.

At every step:

  1. extract the non-tree vertex \(u\) with minimum key;
  2. add \(u\) to the tree;
  3. inspect every edge \((u, v)\) leaving \(u\);
  4. if \(v\) is still outside the tree and \(w(u, v)\) is smaller than \(v.\text{key}\), update \(v.\text{key}\) and set \(v.\pi = u\).

Thus the priority queue stores vertices, not edges. Each vertex is present only once in the queue, and its priority is decreased whenever we discover a cheaper connection to the current tree.

1.3.3 Pseudocode
MST-PRIM(G, w, r)
1  for each vertex u ∈ G.V
2      u.key = ∞
3      u.π  = NIL
4  r.key = 0
5  Q = ∅
6  for each vertex u ∈ G.V
7      INSERT(Q, u)
8  while Q ≠ ∅
9      u = EXTRACT-MIN(Q)
10     for each vertex v in G.Adj[u]
11         if v ∈ Q and w(u, v) < v.key
12             v.π  = u
13             v.key = w(u, v)
14             DECREASE-KEY(Q, v, w(u, v))

After the algorithm terminates, the MST consists of all edges \((u.\pi, u)\) for non-root vertices \(u\).
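The pseudocode can be sketched in Python (names such as `prim_mst` are illustrative, not from the lecture). Python's `heapq` has no DECREASE-KEY, so the sketch pushes a fresh heap entry on every improvement and skips stale entries at extraction; this standard substitute preserves the \(O(|E| \log |V|)\) bound:

```python
import heapq

def prim_mst(adj, root):
    """Prim's algorithm with a binary heap. Lazy deletion replaces
    DECREASE-KEY: each improvement pushes a new entry, and entries for
    already-extracted vertices are skipped when popped."""
    key = {v: float("inf") for v in adj}
    parent = {v: None for v in adj}
    key[root] = 0
    in_tree = set()
    heap = [(0, root)]
    total = 0
    while heap:
        k, u = heapq.heappop(heap)
        if u in in_tree:
            continue  # stale entry for an already-extracted vertex
        in_tree.add(u)
        total += k
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                key[v] = w            # cheaper connection to the tree found
                parent[v] = u
                heapq.heappush(heap, (w, v))
    return total, parent

# The 6-vertex graph from Section 1.2.2 (MST weight 28).
edges = [("A", "B", 6), ("A", "D", 4), ("B", "D", 7), ("B", "C", 10),
         ("B", "E", 7), ("D", "E", 12), ("C", "D", 8), ("C", "E", 5),
         ("C", "F", 6), ("E", "F", 7)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))
total, parent = prim_mst(adj, "A")  # total == 28
```

The MST edges themselves are recoverable as the pairs \((parent[v], v)\) for every non-root vertex, exactly as stated above.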

1.3.4 Why Prim’s Algorithm is Correct

At the moment a vertex \(u\) is extracted from the priority queue, the current tree vertices form one side of a cut and all remaining vertices form the other side. The key of \(u\) is the smallest weight of any edge crossing that cut into \(u\), and since \(u\) has minimum key among all remaining vertices, the edge \((u.\pi, u)\) is a light edge for the cut. By the cut property, it is safe to add.

Therefore each iteration preserves the invariant that the chosen edges remain extendable to an MST. After \(|V|-1\) safe additions, we have a spanning tree, and since every step was safe, that spanning tree is minimum.

1.3.5 Complexity Analysis

The total running time depends on the priority-queue implementation.

  • Initialization of all keys and predecessors costs \(\Theta(|V|)\).
  • EXTRACT-MIN is called once per vertex.
  • DECREASE-KEY is called whenever an edge gives a better connection to a vertex still outside the tree.

With a binary min-heap:

  • \(|V|\) extractions cost \(O(|V| \log |V|)\);
  • up to \(|E|\) successful key decreases cost \(O(|E| \log |V|)\).

So the total running time is

\[O(|E| \log |V|).\]

With a Fibonacci heap, DECREASE-KEY is \(O(1)\) amortized, so the total becomes

\[O(|E| + |V| \log |V|).\]

An additional useful bound appears in worst-case traces. Let \(n = |V|\). In the worst case, the total number of successful DECREASE-KEY operations can be

\[1 + 2 + \cdots + (n - 1) = \frac{n(n-1)}{2}.\]

Why? The second extracted vertex can be improved once before it is chosen, the third can be improved twice, and in general the vertex extracted in position \(j\) can be improved up to \(j-1\) times before leaving the queue. A carefully weighted complete graph can realise this bound.

1.4 Disjoint-Set Data Structures
1.4.1 Motivation and Operations

Many applications maintain a family of pairwise disjoint sets and repeatedly ask two kinds of questions:

  • which set contains a given element?
  • how do we merge two sets into one?

The disjoint-set data structure (also called union-find or DSU) supports exactly this with three operations:

  • MAKE-SET(x) creates a singleton set \(\{x\}\);
  • FIND-SET(x) returns the representative of the set containing \(x\);
  • UNION(x, y) merges the sets containing \(x\) and \(y\).

The representative is simply an element chosen to identify the set. The crucial invariant is that all elements in the same set must return the same representative, so equality of representatives becomes a constant-time test for membership in the same component.

1.4.2 Linked-List Representation

In the simplest representation, each set is a linked list. The head of the list is the representative, and every node stores a pointer to that head.

  • MAKE-SET is \(O(1)\).
  • FIND-SET is \(O(1)\) because the representative pointer is stored directly.
  • UNION may be expensive because if one list is appended to another, the representative pointer of every node in the appended list must be updated.

In the worst case, one UNION can therefore cost \(\Theta(n)\).

1.4.3 Weighted Union (Union by Size)

To improve linked-list DSU, store the size of each list and always append the shorter list to the longer one. Then only the smaller side needs representative-pointer updates.

This yields a standard amortized bound: each element’s representative pointer is updated at most \(\lfloor \log_2 n \rfloor\) times over the entire sequence of operations. The reason is doubling: whenever an element’s pointer changes, it moves from a smaller list into a set of at least twice the previous size. That doubling cannot happen more than \(\log_2 n\) times.

As a result, linked lists with union by size support UNION in \(O(\log n)\) amortized time while keeping FIND-SET at \(O(1)\).
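The linked-list scheme with union by size can be sketched as follows (illustrative Python; the `members` lists stand in for the linked lists, and `rep` is each element's stored representative pointer):

```python
class LinkedListDSU:
    """Linked-list DSU with union by size: every element keeps a direct
    pointer to its representative, and UNION relabels only the smaller set."""

    def __init__(self):
        self.rep = {}      # element -> representative (the "head" pointer)
        self.members = {}  # representative -> list of elements in its set

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):        # O(1): the pointer is stored directly
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return
        if len(self.members[rx]) < len(self.members[ry]):
            rx, ry = ry, rx       # make rx the representative of the larger set
        for z in self.members[ry]:
            self.rep[z] = rx      # relabel only the smaller side
        self.members[rx].extend(self.members[ry])
        del self.members[ry]
```

The relabelling loop is exactly where the doubling argument applies: an element's `rep` pointer changes only when its set is absorbed into one at least as large.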

1.4.4 Forest Representation

A more powerful representation stores each set as a rooted tree. The root is the representative, and every node stores only a parent pointer. A root points to itself.

  • MAKE-SET(x): create a one-node tree with parent[x] = x.
  • FIND-SET(x): follow parent pointers to the root.
  • UNION(x, y): link one root under the other.

Without balancing, repeated unions may produce a long chain of height \(\Theta(n)\), making FIND-SET very slow.

1.4.5 Union by Rank

To keep forest trees shallow, store a rank at each root. When merging two sets:

  • if the ranks differ, attach the lower-rank root under the higher-rank root;
  • if the ranks are equal, choose one root as the new root and increment its rank by 1.

The core theorem is that a root of rank \(k\) must have at least \(2^k\) nodes in its tree. Therefore a tree on \(n\) nodes has rank at most \(\lfloor \log_2 n \rfloor\), and so its height is also \(O(\log n)\) when path compression is not used.

Hence union by rank alone gives worst-case time

\[O(\log n)\]

for both FIND-SET and UNION.

1.4.6 Path Compression

Path compression speeds up FIND-SET(x) by making every node on the search path point directly to the root.

(Figure: before FIND-SET(\(g\)), the parent chain is \(g \to d \to b \to a\) with root \(a\); after FIND-SET(\(g\)), the nodes \(b\), \(d\), and \(g\) all point directly to \(a\).)

The effect is dramatic in long operation sequences: once a path has been compressed, future finds on those nodes become much cheaper. Path compression does not recompute ranks exactly; after compression, ranks remain safe upper bounds rather than exact heights.

1.4.7 Combined Complexity and Practical Augmentation

With union by rank together with path compression, the amortized complexity of both FIND-SET and UNION is

\[O(\alpha(n)),\]

where \(\alpha(n)\) is the inverse Ackermann function. This function grows so slowly that for every practical input size, \(\alpha(n) \le 4\).
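Both heuristics fit in a few lines. The following sketch (illustrative Python; the class name `DSU` is chosen here, not prescribed by the lecture) compresses paths recursively in `find_set` and applies union by rank in `union`:

```python
class DSU:
    """Forest DSU with union by rank and path compression."""

    def __init__(self, elements):
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find_set(self, x):
        if self.parent[x] != x:
            # Path compression: point x (and all its ancestors) at the root.
            self.parent[x] = self.find_set(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx           # attach the lower-rank root under rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1        # equal ranks: the surviving root grows
        return rx
```

Note that `union` increments a rank only in the equal-rank case, so ranks remain the \(O(\log n)\) upper bounds described above even as compression flattens the trees.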

An important practical idea is that DSU can be augmented with aggregate information stored only at representatives. For example, if each set must answer:

  • the total weight or sum of values in the component;
  • the number of elements in the component;
  • the average value of elements in the component;

then each representative can store fields such as

  • sum[root];
  • size[root].

When two sets are merged, the new root updates these in \(O(1)\) extra time:

sum[newRoot] = sum[root1] + sum[root2];
size[newRoot] = size[root1] + size[root2];

Then the average for the component containing \(x\) is

\[\frac{\mathrm{sum}[\mathrm{FIND\text{-}SET}(x)]}{\mathrm{size}[\mathrm{FIND\text{-}SET}(x)]}.\]

The asymptotic complexity does not change, because the extra bookkeeping is constant-time per merge.

1.4.8 Comparison of Main DSU Variants
Representation MAKE-SET FIND-SET UNION
Lists (naive union) \(\Theta(1)\) \(\Theta(1)\) \(\Theta(n)\) worst
Lists (union by size) \(\Theta(1)\) \(\Theta(1)\) \(O(\log n)\) amort.
Forest (naive) \(\Theta(1)\) \(\Theta(n)\) worst \(\Theta(n)\) worst
Forest (union by rank) \(\Theta(1)\) \(O(\log n)\) \(O(\log n)\)
Forest (rank + path compression) \(\Theta(1)\) \(O(\alpha(n))\) amort. \(O(\alpha(n))\) amort.

For algorithm design and implementation, the final row is almost always the best choice.

1.5 Kruskal’s Algorithm
1.5.1 Idea

Kruskal’s algorithm grows the MST in the opposite direction from Prim’s algorithm. Instead of expanding one tree from a root, it starts with \(|V|\) isolated vertices and repeatedly joins components using the lightest available edge.

The algorithm is:

  1. create one singleton set for each vertex;
  2. sort all edges by non-decreasing weight;
  3. process edges in that order;
  4. add an edge \((u, v)\) if and only if \(\mathrm{FIND\text{-}SET}(u) \neq \mathrm{FIND\text{-}SET}(v)\).

If the edge’s endpoints already lie in the same DSU component, adding the edge would create a cycle, so the edge is skipped.
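The four steps translate directly into code. The sketch below (illustrative Python; the inline DSU uses union by rank with path halving, a common variant of path compression) accepts exactly those edges whose endpoints lie in different components:

```python
def kruskal(n, edges):
    """Kruskal's algorithm on vertices 0..n-1.

    edges: list of (weight, u, v) tuples.
    Returns (total_weight, accepted_edges)."""
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, accepted = 0, []
    for w, u, v in sorted(edges):          # step 2: non-decreasing weight
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                       # same component: would close a cycle
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru                    # union by rank
        if rank[ru] == rank[rv]:
            rank[ru] += 1
        accepted.append((u, v, w))
        total += w
    return total, accepted

# The graph from Section 1.2.2 with A..F mapped to 0..5.
example = [(6, 0, 1), (4, 0, 3), (7, 1, 3), (10, 1, 2), (7, 1, 4),
           (12, 3, 4), (8, 2, 3), (5, 2, 4), (6, 2, 5), (7, 4, 5)]
total, tree = kruskal(6, example)  # total == 28, len(tree) == 5
```

On the example graph this recovers the weight-28 MST from Section 1.2.2.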

1.5.2 Why Kruskal’s Algorithm is Correct

At any moment, the edges already accepted by Kruskal form a forest. Consider the next accepted edge \((u, v)\). Its endpoints lie in different components of that forest, so these components define a cut. Since all lighter edges were processed earlier, any lighter crossing edge would already have been added if it connected distinct components. Therefore the current edge is a light edge for that cut, and by the cut property it is safe.

Repeating this argument shows that every accepted edge preserves the possibility of extending to an MST. Once \(|V|-1\) edges have been accepted, the forest has become a spanning tree, and it must be minimum.

1.5.3 Running Time

The running time has two parts:

  • sorting the edges;
  • DSU operations during the scan.

Sorting costs

\[O(|E| \log |E|) = O(|E| \log |V|),\]

because in a simple graph \(|E| \le |V|^2\), so \(\log |E| = O(\log |V|)\).

The DSU part performs \(O(|E|)\) finds and unions, each costing \(O(\alpha(|V|))\) amortized. Therefore the total DSU work is

\[O(|E|\,\alpha(|V|)).\]

In general, sorting dominates, so the overall complexity is

\[O(|E| \log |V|).\]

If the edges are already sorted, or can be sorted in linear time, then the DSU term dominates and the bound improves to

\[O(|E|\,\alpha(|V|)).\]

Tie-breaking matters for execution traces: if equal-weight edges exist, the order of accepted edges and the exact final DSU parent structure need not be unique. However, every valid run still produces a minimum spanning tree (or forest) of the same total weight.

1.6 The Ackermann Function and Its Inverse
1.6.1 Ackermann Function

The Ackermann function is defined recursively by

\[A_k(j) = \begin{cases} j + 1 & \text{if } k = 0, \\ A_{k-1}^{(j+1)}(j) & \text{if } k > 0, \end{cases}\]

where \(A_k^{(i)}(j)\) means applying \(A_k\) exactly \(i\) times, starting from \(j\):

\[A_k^{(i)}(j) = \underbrace{A_k(A_k(\cdots(A_k(j))\cdots))}_{\text{exactly } i \text{ applications}}.\]

This family grows extraordinarily quickly:

\[A_0(1)=2,\qquad A_1(1)=3,\qquad A_2(1)=7,\qquad A_3(1)=2047,\]

and \(A_4(1)\) is already an unimaginably large tower-exponential number.

1.6.2 Inverse Ackermann Function

The inverse Ackermann function is

\[\alpha(n) = \min \{k \mid A_k(1) \ge n\}.\]

Using the values above:

\[\alpha(n) = \begin{cases} 0 & 0 \le n \le 2, \\ 1 & n = 3, \\ 2 & 4 \le n \le 7, \\ 3 & 8 \le n \le 2047, \\ 4 & 2048 \le n \le A_4(1). \end{cases}\]

Because \(A_4(1)\) is astronomically large, we effectively have \(\alpha(n) \le 4\) for all practical values of \(n\). That is why DSU operations with rank and path compression are often treated as “almost constant time” in practice.
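The small values above can be checked directly (a sketch; the loop in `alpha` is capped at \(k = 4\) because evaluating \(A_4(1)\) is infeasible, and every \(n\) between \(2048\) and \(A_4(1)\) has \(\alpha(n) = 4\) anyway):

```python
def ackermann(k, j):
    """A_0(j) = j + 1; for k > 0, A_k(j) applies A_{k-1} exactly (j+1) times to j."""
    if k == 0:
        return j + 1
    x = j
    for _ in range(j + 1):
        x = ackermann(k - 1, x)
    return x

def alpha(n):
    """Inverse Ackermann, valid for all practical n (i.e. n <= A_4(1))."""
    for k in range(4):
        if ackermann(k, 1) >= n:
            return k
    return 4
```

For example, `ackermann(3, 1)` evaluates to 2047 and `alpha(2047)` to 3, matching the table above; `alpha(2048)` returns 4.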


2. Definitions

  • Optimization problem: a problem in which the goal is to choose a feasible solution of minimum or maximum value.
  • Greedy algorithm: an algorithm that builds a solution step by step by making a locally optimal-looking choice at each stage.
  • Spanning tree: for a connected undirected graph \(G = (V, E)\), a connected acyclic edge set \(T \subseteq E\) containing all vertices.
  • Minimum spanning tree (MST): a spanning tree whose total edge weight is minimum among all spanning trees of the graph.
  • Spanning forest: a maximal acyclic subgraph of a disconnected graph; equivalently, a union of spanning trees, one per connected component.
  • Weight of a spanning tree: the sum \(w(T) = \sum_{e \in T} w(e)\) of the weights of its edges.
  • Cut \((S, V \setminus S)\): a partition of the vertex set into two non-empty disjoint subsets.
  • Light edge: a minimum-weight edge crossing a given cut.
  • Cut property: the theorem stating that a light edge crossing a cut that respects the current partial solution is safe to add to some MST.
  • Prim’s algorithm: a greedy MST algorithm that grows one tree from a root by repeatedly extracting the non-tree vertex with smallest connection key.
  • Key (in Prim’s algorithm): the weight of the currently cheapest known edge connecting a non-tree vertex to the current tree.
  • Predecessor \(v.\pi\): the vertex that currently gives \(v\) its best known connection into the MST.
  • Disjoint-set data structure (DSU / union-find): a data structure that maintains a partition of elements into disjoint sets and supports MAKE-SET, FIND-SET, and UNION.
  • Representative: the designated element or root used to identify a DSU set.
  • Union by size: a linked-list DSU heuristic that appends the shorter list to the longer list.
  • Forest representation: a DSU representation in which each set is a rooted tree and the root is the representative.
  • Rank: an upper bound on subtree height stored at DSU roots and used by union by rank.
  • Union by rank: the heuristic that attaches the lower-rank root under the higher-rank root, incrementing rank only when the two ranks are equal.
  • Path compression: the optimization that makes every node on a FIND-SET path point directly to the root.
  • Representative metadata: aggregate information stored only at the root of a DSU set, such as component size or component sum.
  • Kruskal’s algorithm: a greedy MST algorithm that scans edges in non-decreasing weight order and adds an edge exactly when it connects two different DSU components.
  • Ackermann function \(A_k(j)\): a rapidly growing recursively defined function given by \(A_0(j)=j+1\) and \(A_k(j)=A_{k-1}^{(j+1)}(j)\) for \(k>0\).
  • Inverse Ackermann function \(\alpha(n)\): the minimum \(k\) such that \(A_k(1) \ge n\).

3. Formulas

  • MST weight: \(w(T) = \displaystyle\sum_{e \in T} w(e)\)
  • Prim’s algorithm with binary heap: \(O(|E| \log |V|)\)
  • Prim’s algorithm with Fibonacci heap: \(O(|E| + |V| \log |V|)\)
  • Worst-case successful DECREASE-KEY calls in Prim on \(n\) vertices: \(\displaystyle\sum_{i=1}^{n-1} i = \frac{n(n-1)}{2}\)
  • Weighted-union pointer-update bound: each element’s representative pointer changes at most \(\lfloor \log_2 n \rfloor\) times
  • Union-by-rank size lower bound: a root of rank \(k\) has at least \(2^k\) nodes
  • Union-by-rank height bound: \(\text{height} \le \lfloor \log_2 n \rfloor\)
  • Component average in an augmented DSU: \(\displaystyle\frac{\texttt{sum[root]}}{\texttt{size[root]}}\)
  • Kruskal’s algorithm, general case: \(O(|E| \log |V|)\)
  • Kruskal’s algorithm with linear-time sorting: \(O(|E|\,\alpha(|V|))\)
  • Ackermann base case: \(A_0(j) = j + 1\)
  • Ackermann recursion: \(A_k(j) = A_{k-1}^{(j+1)}(j)\) for \(k > 0\)
  • Level-1 Ackermann closed form: \(A_1(j) = 2j + 1\)
  • Level-2 Ackermann closed form: \(A_2(j) = 2^{j+1}(j+1) - 1\)
  • Inverse Ackermann function: \(\alpha(n) = \min \{ k \mid A_k(1) \ge n \}\)

4. Examples

4.1. Maximize DECREASE-KEY Calls in Prim’s Algorithm (Problem Set 12, Task 1)

Consider Prim’s algorithm on a connected undirected graph with source vertex \(S\).

(Figure: the complete graph on \(\{S, A, B, C, D\}\) with weights \(SA=10,\ SB=20,\ SC=30,\ SD=40,\ AB=9,\ AC=19,\ AD=29,\ BC=8,\ BD=18,\ CD=7\).)

(a) Determine the worst-case number of successful DECREASE-KEY operations on an \(n\)-vertex graph.

(b) Exhibit a concrete 5-vertex graph achieving that bound.

(c) Run Prim’s algorithm on that graph using a binary min-heap and report the heap after each step.

Solution.

Key Concept: In the worst case, a vertex can receive a new smaller key once for each earlier extracted vertex, so the total number of successful updates forms a triangular sum.

Let \(n = |V|\).

(a) Worst-case count.

The source is extracted first and never receives a decrease. Now focus on the remaining vertices in the order they are eventually extracted. The second extracted vertex can have its key decreased at most once before leaving the queue, the third can have its key decreased at most twice, and in general the vertex extracted in position \(j\) can be improved at most \(j-1\) times.

Therefore the total worst-case number of successful DECREASE-KEY operations is

\[1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2}.\]

This bound is achievable by a carefully weighted complete graph in which every newly added vertex improves the key of every vertex still in the queue.

(b) A concrete 5-vertex construction.

Take the complete graph on vertices \(\{S, A, B, C, D\}\) with source \(S\) and edge weights:

\[SA=10,\ SB=20,\ SC=30,\ SD=40,\] \[AB=9,\ AC=19,\ AD=29,\ BC=8,\ BD=18,\ CD=7.\]

These weights force Prim’s algorithm to extract vertices in the order

\[S,\ A,\ B,\ C,\ D,\]

and each extracted vertex improves the key of every later vertex.

(c) Binary-heap trace.

We use a standard binary min-heap, and we list the heap as an array of (key, vertex) pairs after processing each extracted vertex.

Initial heap:

\[[(0,S),\ (\infty,A),\ (\infty,B),\ (\infty,C),\ (\infty,D)].\]

Step Vertex added Edge used Heap after processing Successful DECREASE-KEY calls
1 \(S\) source [(10,A), (20,B), (30,C), (40,D)] 4
2 \(A\) \(SA\) [(9,B), (19,C), (29,D)] 3
3 \(B\) \(AB\) [(8,C), (18,D)] 2
4 \(C\) \(BC\) [(7,D)] 1
5 \(D\) \(CD\) [] 0

The total is

\[4 + 3 + 2 + 1 = 10 = \frac{5 \cdot 4}{2},\]

which matches the general formula.
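The count can also be checked by simulation (a sketch; an array scan stands in for the binary heap's EXTRACT-MIN, which does not change the number of successful key decreases):

```python
import math

def count_key_decreases(weights, source):
    """Simulate Prim's algorithm and count successful DECREASE-KEY calls.

    weights: dict mapping frozenset({u, v}) -> edge weight."""
    vertices = {x for edge in weights for x in edge}
    key = {v: math.inf for v in vertices}
    key[source] = 0
    in_queue = set(vertices)
    decreases = 0
    while in_queue:
        u = min(in_queue, key=lambda v: key[v])   # EXTRACT-MIN
        in_queue.remove(u)
        for v in in_queue:
            w = weights.get(frozenset({u, v}))
            if w is not None and w < key[v]:
                key[v] = w                         # successful DECREASE-KEY
                decreases += 1
    return decreases

# The 5-vertex complete graph from part (b).
w = {frozenset(p): c for p, c in [
    (("S", "A"), 10), (("S", "B"), 20), (("S", "C"), 30), (("S", "D"), 40),
    (("A", "B"), 9), (("A", "C"), 19), (("A", "D"), 29),
    (("B", "C"), 8), (("B", "D"), 18), (("C", "D"), 7)]}
count = count_key_decreases(w, "S")  # → 10
```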

Answer: The worst-case number of successful DECREASE-KEY calls is \(\frac{n(n-1)}{2}\); the 5-vertex complete graph above realizes it, and its Prim trace performs exactly 10 such updates.

4.2. Augment DSU to Answer Pool Average Queries (Problem Set 12, Task 2)

A CDN operator maintains caches \(1,2,\dots,n\). Each cache \(i\) has a positive stored-object count \(w_i\). Initially each cache is its own pool. We must support:

  • merging the pools containing \(a\) and \(b\);
  • checking whether \(a\) and \(b\) are in the same pool;
  • reporting the average stored-object count in the pool containing \(a\).

Design a DSU variant that supports these operations efficiently.

Solution.

Key Concept: Store aggregate data only at representatives, and update that metadata in constant extra time whenever two sets are merged.

The right idea is to use the standard forest-based DSU with union by rank and path compression, and to store aggregate information only at representatives.

Data stored.

For each root \(r\), maintain:

  • sum[r]: total stored-object count in the pool;
  • size[r]: number of caches in the pool.

Initially, for every cache \(i\):

\[parent[i] = i,\qquad rank[i] = 0,\qquad sum[i] = w_i,\qquad size[i] = 1.\]

Merge operation.

To merge the pools containing \(a\) and \(b\):

  1. compute \(r_a = FIND\text{-}SET(a)\) and \(r_b = FIND\text{-}SET(b)\);
  2. if \(r_a = r_b\), do nothing;
  3. otherwise perform union by rank;
  4. if the new root is \(r\), update \[sum[r] = sum[r_a] + sum[r_b],\] \[size[r] = size[r_a] + size[r_b].\]

All non-root nodes may keep stale sum and size values, because queries always go through the representative.

Same-pool query.

Return

\[FIND\text{-}SET(a) = FIND\text{-}SET(b).\]

Average query.

Let \(r = FIND\text{-}SET(a)\). Then the answer is

\[\frac{sum[r]}{size[r]}.\]

Pseudocode.

MERGE(a, b):
  ra = FIND-SET(a)
  rb = FIND-SET(b)
  if ra == rb:
    return
  r = UNION(ra, rb)          // returns the new representative
  if r == ra:
    sum[ra]  = sum[ra]  + sum[rb]
    size[ra] = size[ra] + size[rb]
  else:
    sum[rb]  = sum[ra]  + sum[rb]
    size[rb] = size[ra] + size[rb]

SAME-POOL(a, b):
  return FIND-SET(a) == FIND-SET(b)

AVERAGE(a):
  r = FIND-SET(a)
  return sum[r] / size[r]
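For reference, here is a runnable Python rendering of the pseudocode above (a sketch; caches are indexed \(0, \dots, n-1\), and `PoolDSU` is an illustrative name):

```python
class PoolDSU:
    """Rank + path-compression DSU augmented with per-pool sum and size.

    weights[i] is the stored-object count of cache i."""

    def __init__(self, weights):
        n = len(weights)
        self.parent = list(range(n))
        self.rank = [0] * n
        self.sum = list(weights)   # trusted only at roots
        self.size = [1] * n        # likewise

    def find_set(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])  # path compression
        return self.parent[x]

    def merge(self, a, b):
        ra, rb = self.find_set(a), self.find_set(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra                  # union by rank
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        self.sum[ra] += self.sum[rb]          # O(1) metadata update
        self.size[ra] += self.size[rb]

    def same_pool(self, a, b):
        return self.find_set(a) == self.find_set(b)

    def average(self, a):
        r = self.find_set(a)
        return self.sum[r] / self.size[r]
```

Non-root entries of `sum` and `size` go stale after merges, exactly as the pseudocode allows, because every query routes through the representative first.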

Complexity.

With union by rank and path compression:

  • MERGE performs two finds, one union, and \(O(1)\) extra metadata updates;
  • SAME-POOL performs two finds;
  • AVERAGE performs one find and one division.

So each operation costs

\[O(\alpha(n)) \text{ amortized},\]

and a sequence of \(q\) operations costs

\[O(q\,\alpha(n)).\]

Answer: Use a rank-and-compression DSU augmented with sum[root] and size[root]; then MERGE, SAME-POOL, and AVERAGE all run in \(O(\alpha(n))\) amortized time.

4.3. Run Kruskal’s Algorithm and Inspect the Final DSU State (Problem Set 12, Task 3)

Consider the following weighted undirected graph on vertices \(A\) through \(L\):

(Figure: the graph has edges \(AB(1),\ AC(3),\ AE(8),\ AF(9),\ AG(9),\ BC(4),\ BF(8),\ CD(2),\ CG(7),\ DG(8),\ DH(9),\ EF(4),\ EG(5),\ EK(11),\ EL(12),\ FL(13),\ GH(3),\ GJ(11),\ GK(9),\ HI(10),\ IJ(5),\ IK(7),\ JK(8),\ KL(3)\).)

Assume Kruskal uses a forest-based DSU with union by rank and path compression. Break weight ties lexicographically by the ordered pair of endpoint labels.

Solution.

Key Concept: Kruskal sorts edges by weight and accepts exactly those edges whose endpoints are in different DSU components; skipped edges are precisely the ones that would create cycles.

Step 1: Sort edges by weight.

The beginning of the sorted list is:

\[AB(1),\ CD(2),\ AC(3),\ GH(3),\ KL(3),\ BC(4),\ EF(4),\ EG(5),\ IJ(5),\ CG(7),\ IK(7),\dots\]

Step 2: Scan edges in order and accept only those joining different components.

Edge Weight Action Reason
\(AB\) 1 add different components
\(CD\) 2 add different components
\(AC\) 3 add joins \(\{A,B\}\) with \(\{C,D\}\)
\(GH\) 3 add different components
\(KL\) 3 add different components
\(BC\) 4 skip \(B\) and \(C\) are already connected via \(A\)
\(EF\) 4 add different components
\(EG\) 5 add joins \(\{E,F\}\) with \(\{G,H\}\)
\(IJ\) 5 add different components
\(CG\) 7 add joins \(\{A,B,C,D\}\) with \(\{E,F,G,H\}\)
\(IK\) 7 add joins \(\{I,J\}\) with \(\{K,L\}\)
\(AE\) 8 skip endpoints already in the same component
\(BF\) 8 skip endpoints already in the same component
\(DG\) 8 skip endpoints already in the same component
\(JK\) 8 skip endpoints already in the same component
\(AF\) 9 skip endpoints already in the same component
\(AG\) 9 skip endpoints already in the same component
\(DH\) 9 skip endpoints already in the same component
\(GK\) 9 add joins the left 8-vertex component with \(\{I,J,K,L\}\)

After \(GK(9)\), all 12 vertices are connected, so Kruskal stops. The minimum spanning tree (equivalently, the minimum spanning forest, since the graph is connected) consists of:

\[AB,\ CD,\ AC,\ GH,\ KL,\ EF,\ EG,\ IJ,\ CG,\ IK,\ GK.\]

Its total weight is

\[1 + 2 + 3 + 3 + 3 + 4 + 5 + 5 + 7 + 7 + 9 = 49.\]
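The total can be double-checked by running any correct Kruskal implementation over the edge list. The sketch below uses a minimal DSU with path halving and no rank heuristic; the resulting weight is independent of tie-breaking, since every valid run yields the same total:

```python
def mst_weight(edges):
    """Total MST weight via Kruskal. The DSU uses path halving only;
    the union heuristic does not affect the computed weight."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total = 0
    for w, u, v in sorted(edges):          # non-decreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total

# Edge list of the Task 3 graph.
edges = [(1, "A", "B"), (3, "A", "C"), (8, "A", "E"), (9, "A", "F"),
         (9, "A", "G"), (4, "B", "C"), (8, "B", "F"), (2, "C", "D"),
         (7, "C", "G"), (8, "D", "G"), (9, "D", "H"), (4, "E", "F"),
         (5, "E", "G"), (11, "E", "K"), (12, "E", "L"), (13, "F", "L"),
         (3, "G", "H"), (11, "G", "J"), (9, "G", "K"), (10, "H", "I"),
         (5, "I", "J"), (7, "I", "K"), (8, "J", "K"), (3, "K", "L")]
weight = mst_weight(edges)  # → 49
```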

Final DSU state.

Using the stated tie-breaking rule and choosing the root of the first endpoint’s set when ranks are equal, one valid final state after a full round of path compression is:

Vertex Parent Rank
A A 3
B A 0
C A 1
D A 0
E A 2
F A 0
G A 1
H A 0
I A 2
J A 0
K A 1
L A 0

All parents become \(A\) because path compression flattens the forest when final finds are performed. The rank values are not recomputed during path compression, so they remain historical upper bounds from the union process.

Are these answers unique?

Not completely. The set of accepted edges is forced here by the tie-breaking rule, but in general Kruskal traces can vary when equal-weight edges are processed in a different order. Likewise, the exact DSU parent and rank arrays are not unique because equal-rank unions may choose different roots. What remains invariant is the existence of a minimum spanning tree with the same total weight.

Answer: The accepted edges are \(AB, CD, AC, GH, KL, EF, EG, IJ, CG, IK, GK\) in that order, the total MSF weight is \(49\), and one valid final compressed DSU state has every vertex pointing to root \(A\) with ranks as listed above.

4.4. Trace Prim’s Algorithm on a Weighted Graph (Lecture 11, Example 1)

Apply both the naive edge-based heuristic and Prim’s algorithm to the graph from Section 1.2.2, starting from vertex \(A\).

Click to see the solution

Key Concept: The naive method keeps frontier edges in the queue, while Prim keeps one queue entry per frontier vertex; both find the same MST, but Prim avoids obsolete edge entries.

We use the 6-vertex graph from Section 1.2.2, whose MST weight is 28.

Naive edge-based heuristic.

The priority queue stores frontier edges.

Step Action Queue after the step
Start Tree contains only \(A\) \(\{A\text{–}D(4),\ A\text{–}B(6)\}\)
1 Extract \(A\text{–}D(4)\), add \(D\) \(\{A\text{–}B(6),\ D\text{–}B(7),\ D\text{–}C(8),\ D\text{–}E(12)\}\)
2 Extract \(A\text{–}B(6)\), add \(B\) \(\{D\text{–}B(7),\ B\text{–}E(7),\ D\text{–}C(8),\ B\text{–}C(10),\ D\text{–}E(12)\}\)
3 Extract \(D\text{–}B(7)\), discard (both ends already in tree); then extract \(B\text{–}E(7)\), add \(E\) \(\{E\text{–}C(5),\ E\text{–}F(7),\ D\text{–}C(8),\ B\text{–}C(10),\ D\text{–}E(12)\}\)
4 Extract \(E\text{–}C(5)\), add \(C\) \(\{C\text{–}F(6),\ E\text{–}F(7),\ D\text{–}C(8),\ B\text{–}C(10),\ D\text{–}E(12)\}\)
5 Extract \(C\text{–}F(6)\), add \(F\); remaining entries are stale done

The resulting tree edges are

\[A\text{–}D,\ A\text{–}B,\ B\text{–}E,\ E\text{–}C,\ C\text{–}F,\]

with total weight \(28\).

Prim’s algorithm.

Now the priority queue stores vertices keyed by their cheapest known connection to the tree.

Step Extracted vertex Edge added Key updates
1 \(A\) source \(B \leftarrow 6\), \(D \leftarrow 4\)
2 \(D\) \(A\text{–}D(4)\) \(C \leftarrow 8\), \(E \leftarrow 12\)
3 \(B\) \(A\text{–}B(6)\) \(E \leftarrow 7\)
4 \(E\) \(B\text{–}E(7)\) \(C \leftarrow 5\)
5 \(C\) \(E\text{–}C(5)\) \(F \leftarrow 6\)
6 \(F\) \(C\text{–}F(6)\) none

So Prim produces the same MST:

\[A\text{–}D,\ A\text{–}B,\ B\text{–}E,\ E\text{–}C,\ C\text{–}F,\]

again with total weight \(\mathbf{28}\).

The key implementation improvement is that every vertex appears only once in the queue. Old candidate edges do not accumulate, so the algorithm is cleaner and easier to analyze.

Answer: Both traces produce the MST \(AD, AB, BE, EC, CF\) of total weight \(28\), but Prim’s vertex-based queue gives the cleaner and standard implementation.

4.5. Apply the Naive MST Heuristic Starting from Vertex A (Lecture 11, Example 2)

Apply the naive “smallest edge from the current tree to the outside” heuristic to the graph from Section 1.2.2, starting from \(A\). Show the queue evolution and identify the data structure needed for an efficient implementation.

Solution.

Key Concept: The heuristic repeatedly extracts the cheapest edge leaving the current tree, so the only required data structure is a min-priority queue over frontier edges.

This is the same trace as in Example 4.4, but written from the data-structure point of view.

Initial queue:

\[Q = \{A\text{–}D(4),\ A\text{–}B(6)\}.\]

Step 1. Extract \(A\text{–}D(4)\), add \(D\), and insert edges from \(D\) to non-tree vertices:

\[Q = \{A\text{–}B(6),\ D\text{–}B(7),\ D\text{–}C(8),\ D\text{–}E(12)\}.\]

Step 2. Extract \(A\text{–}B(6)\), add \(B\), and insert edges from \(B\) to non-tree vertices:

\[Q = \{D\text{–}B(7),\ B\text{–}E(7),\ D\text{–}C(8),\ B\text{–}C(10),\ D\text{–}E(12)\}.\]

Step 3. Extract \(D\text{–}B(7)\) and discard it, because both endpoints are already in the tree. Then extract \(B\text{–}E(7)\), add \(E\), and insert \(E\text{–}C(5)\):

\[Q = \{E\text{–}C(5),\ D\text{–}C(8),\ B\text{–}C(10),\ D\text{–}E(12)\}.\]

Step 4. Extract \(E\text{–}C(5)\), add \(C\), and insert \(C\text{–}F(6)\):

\[Q = \{C\text{–}F(6),\ D\text{–}C(8),\ B\text{–}C(10),\ D\text{–}E(12)\}.\]

Step 5. Extract \(C\text{–}F(6)\) and add \(F\). All vertices are now in the tree.

So the algorithm returns the MST

\[\{A\text{–}D,\ A\text{–}B,\ B\text{–}E,\ E\text{–}C,\ C\text{–}F\},\]

with total weight \(28\).

The required data structure is a min-priority queue of edges, typically implemented as a binary min-heap. It supports repeated extraction of the minimum-weight candidate edge in \(O(\log |E|)\) time.

Answer: Starting from \(A\), the heuristic selects \(AD, AB, BE, EC, CF\) and returns an MST of weight \(28\); an efficient implementation uses a min-priority queue of edges.
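The edge-based heuristic above maps directly onto Python's `heapq` binary min-heap. The following sketch is one way to realize it (the adjacency-dict format is an assumption for illustration); note that stale edges whose endpoints are both inside the tree are simply discarded on extraction, exactly as in Step 3 of the trace.

```python
import heapq

def naive_mst(adj, source):
    """Edge-based greedy MST heuristic: repeatedly extract the cheapest
    frontier edge from a binary min-heap of candidate edges."""
    in_tree = {source}
    heap = [(w, source, v) for v, w in adj[source].items()]
    heapq.heapify(heap)
    mst, total = [], 0
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                      # stale edge: both endpoints inside
        in_tree.add(v)
        mst.append((u, v))
        total += w
        for x, wx in adj[v].items():      # push new frontier edges
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return mst, total
```

Each push and pop costs \(O(\log |E|)\), and unlike the vertex-keyed variant, the heap may temporarily hold several candidate edges per outside vertex.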

4.6. Apply Prim’s Algorithm Starting from Vertex L (Lecture 11, Example 3)

Consider the undirected weighted graph on vertices \(A\) through \(R\) with edges

\[AP(5),\ AJ(4),\ AR(2),\ AE(1),\ AK(6),\ AG(2),\] \[GM(6),\ GK(1),\ KM(2),\ KB(5),\ MB(2),\ BH(1),\ BQ(2),\] \[QK(4),\ QE(5),\ QF(4),\ FN(3),\ NB(6),\ NH(5),\ NC(1),\] \[CF(3),\ CE(5),\ CL(6),\ CI(3),\ IL(4),\ IF(3),\ IO(5),\] \[OD(2),\ DI(1),\ DL(5),\ DR(3),\ DJ(4),\ JR(3).\]

Run Prim’s algorithm from source vertex \(L\), breaking ties lexicographically by vertex label.

Click to see the solution

Key Concept: Prim’s algorithm repeatedly finalizes the smallest current key, and every key decrease records a cheaper connection from the current tree to an outside vertex.

We maintain a min-priority queue of vertices keyed by their current best connection to the growing tree.

Step Extracted vertex Edge added Key Newly improved vertices
1 \(L\) source 0 \(C \leftarrow 6\), \(D \leftarrow 5\), \(I \leftarrow 4\)
2 \(I\) \(L\text{–}I\) 4 \(C \leftarrow 3\), \(D \leftarrow 1\), \(F \leftarrow 3\), \(O \leftarrow 5\)
3 \(D\) \(I\text{–}D\) 1 \(J \leftarrow 4\), \(O \leftarrow 2\), \(R \leftarrow 3\)
4 \(O\) \(D\text{–}O\) 2 none
5 \(C\) \(I\text{–}C\) 3 \(E \leftarrow 5\), \(N \leftarrow 1\)
6 \(N\) \(C\text{–}N\) 1 \(B \leftarrow 6\), \(H \leftarrow 5\)
7 \(F\) \(I\text{–}F\) 3 \(Q \leftarrow 4\)
8 \(R\) \(D\text{–}R\) 3 \(A \leftarrow 2\), \(J \leftarrow 3\)
9 \(A\) \(R\text{–}A\) 2 \(E \leftarrow 1\), \(G \leftarrow 2\), \(K \leftarrow 6\), \(P \leftarrow 5\)
10 \(E\) \(A\text{–}E\) 1 none
11 \(G\) \(A\text{–}G\) 2 \(K \leftarrow 1\), \(M \leftarrow 6\)
12 \(K\) \(G\text{–}K\) 1 \(B \leftarrow 5\), \(M \leftarrow 2\)
13 \(M\) \(K\text{–}M\) 2 \(B \leftarrow 2\)
14 \(B\) \(M\text{–}B\) 2 \(H \leftarrow 1\), \(Q \leftarrow 2\)
15 \(H\) \(B\text{–}H\) 1 none
16 \(Q\) \(B\text{–}Q\) 2 none
17 \(J\) \(R\text{–}J\) 3 none
18 \(P\) \(A\text{–}P\) 5 none

Therefore the Prim extraction order is

\[L,\ I,\ D,\ O,\ C,\ N,\ F,\ R,\ A,\ E,\ G,\ K,\ M,\ B,\ H,\ Q,\ J,\ P.\]

The MST edges chosen are

\[LI,\ ID,\ DO,\ IC,\ CN,\ IF,\ DR,\ RA,\ AE,\ AG,\ GK,\ KM,\ MB,\ BH,\ BQ,\ RJ,\ AP.\]

Their total weight is

\[4+1+2+3+1+3+3+2+1+2+1+2+2+1+2+3+5 = 38.\]

Answer: The extraction order is \(L, I, D, O, C, N, F, R, A, E, G, K, M, B, H, Q, J, P\), and the resulting MST has total weight \(38\).

4.7. Prove Height Bound with Union by Rank (Lecture 11, Example 4)

Prove that if only union by rank is used, and path compression is not used, then every DSU tree on at most \(n\) nodes has height at most \(\lfloor \log_2 n \rfloor\).

Click to see the solution

Key Concept: Under union by rank, rank can grow only when two equally ranked trees are merged, which forces the subtree size to at least double.

The key statement is:

Claim. A root of rank \(k\) has at least \(2^k\) nodes in its tree.

We prove this by induction on \(k\).

Base case \(k=0\). A rank-0 tree is a singleton created by MAKE-SET, so it has exactly one node, and

\[1 = 2^0.\]

Inductive step. Assume every rank-\(k\) root has at least \(2^k\) nodes. Consider how a root can obtain rank \(k+1\). This happens only when two roots of equal rank \(k\) are linked. By the inductive hypothesis, each of those trees contains at least \(2^k\) nodes, so after the union the new tree contains at least

\[2^k + 2^k = 2^{k+1}\]

nodes.

Thus the claim holds for all \(k\).

Now let a tree have root rank \(r\). Since the whole structure contains at most \(n\) nodes,

\[2^r \le n,\]

so

\[r \le \log_2 n.\]

Because \(r\) is an integer,

\[r \le \lfloor \log_2 n \rfloor.\]

Without path compression, the rank is an upper bound on the tree height, so

\[\text{height} \le \lfloor \log_2 n \rfloor.\]

Consequently, both FIND-SET and UNION run in \(O(\log n)\) worst-case time when only union by rank is used.

Answer: Any DSU tree built using only union by rank has height at most \(\lfloor \log_2 n \rfloor\), so both FIND-SET and UNION are \(O(\log n)\) in the worst case.
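The doubling argument can be checked experimentally. Below is a minimal Python DSU sketch that deliberately uses union by rank without path compression, followed by a worst-case build (repeatedly merging equal-rank trees) whose height exactly reaches \(\lfloor \log_2 n \rfloor\); the class and variable names are illustrative.

```python
class DSU:
    """Disjoint-set forest with union by rank only (no path compression),
    so every tree on at most n nodes has height <= floor(log2 n)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # walk to the root; no path compression on purpose
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx              # attach lower-rank root under higher
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1            # equal ranks: only case rank grows
```

Merging eight singletons pairwise, then pairwise again, then once more (always equal-rank unions) yields a tree of rank 3 and height 3, which matches \(\lfloor \log_2 8 \rfloor = 3\) and shows the bound is tight.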

4.8. Find Connected Components Using DSU (Lecture 11, Example 5)

Design an algorithm using DSU to compute the number of connected components of an undirected graph \(G = (V, E)\). Also explain how to output the size of each component.

Click to see the solution

Key Concept: DSU merges the endpoints of every edge, so after all unions, each representative corresponds exactly to one connected component.

The algorithm mirrors the definition of connected components.

CONNECTED-COMPONENTS(G):
  for each vertex v in G.V:
    MAKE-SET(v)

  for each edge (u, v) in G.E:
    if FIND-SET(u) != FIND-SET(v):
      UNION(u, v)

  count = 0
  for each vertex v in G.V:
    if FIND-SET(v) == v:
      count = count + 1

  return count

Why it works.

Whenever an edge \((u, v)\) is processed, the DSU merges the sets containing its endpoints. After all edges have been processed, two vertices lie in the same DSU set if and only if they are connected by a path in \(G\). Therefore each DSU root corresponds to one connected component.

Time complexity.

  • MAKE-SET is called \(|V|\) times: \(O(|V|)\).
  • The edge scan performs \(O(|E|)\) finds and unions: \(O(|E|\,\alpha(|V|))\) amortized.
  • The final root count performs \(|V|\) finds: \(O(|V|\,\alpha(|V|))\).

So the total complexity is

\[O((|V| + |E|)\,\alpha(|V|)).\]

How to output component sizes.

Store an additional field size[root], initialized to 1 for each singleton set. Whenever two components are united, update the size of the new root by adding the two old sizes. At the end, each root stores the size of its connected component.

Answer: Process all edges with DSU unions, then count distinct roots; this yields \(O((|V|+|E|)\alpha(|V|))\) time, and storing size[root] additionally gives each component size.
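As a sanity check, here is a runnable Python sketch of the pseudocode above, with the `size[root]` bookkeeping folded in; it uses path compression and union by size, which is one common alternative to union by rank (the function and variable names are illustrative).

```python
def connected_components(vertices, edges):
    """Count connected components with a DSU and report the size of each.
    Returns (component_count, {representative: size})."""
    parent = {v: v for v in vertices}
    size = {v: 1 for v in vertices}       # size of the set rooted at v

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving on the way up
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:       # union by size keeps trees shallow
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]          # new root accumulates both sizes

    roots = {find(v) for v in vertices}
    return len(roots), {r: size[r] for r in roots}
```

For example, on vertices \(\{a,\dots,f\}\) with edges \(ab\), \(bc\), \(de\), the function reports three components of sizes 3, 2, and 1.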

4.9. Apply Kruskal’s Algorithm to the 18-Vertex Graph (Lecture 11, Example 6)

Apply Kruskal’s algorithm to the 18-vertex graph from Example 4.6. Break ties lexicographically by endpoint labels.

Click to see the solution

Key Concept: Kruskal’s algorithm is a cut-property-driven greedy scan: the next accepted edge is always the lightest edge that connects two different current components.

We process the edges in non-decreasing order of weight and add an edge exactly when it connects two different DSU components.

Edge Weight Action Reason
\(AE\) 1 add different components
\(BH\) 1 add different components
\(CN\) 1 add different components
\(DI\) 1 add different components
\(GK\) 1 add different components
\(AG\) 2 add joins \(\{A,E\}\) with \(\{G,K\}\)
\(AR\) 2 add adds \(R\) to the \(A\)-component
\(BM\) 2 add adds \(M\) to the \(B\)-component
\(BQ\) 2 add adds \(Q\) to the \(B\)-component
\(DO\) 2 add adds \(O\) to the \(D\)-component
\(KM\) 2 add joins the \(A\)-side with the \(B\)-side
\(CF\) 3 add adds \(F\) to the \(C\)-component
\(CI\) 3 add joins the \(C\)-side with the \(D\)-side
\(DR\) 3 add joins the left and right large components
\(FI\) 3 skip endpoints already connected
\(FN\) 3 skip endpoints already connected
\(JR\) 3 add adds \(J\)
\(AJ\) 4 skip endpoints already connected
\(DJ\) 4 skip endpoints already connected
\(FQ\) 4 skip endpoints already connected
\(IL\) 4 add adds \(L\)
\(KQ\) 4 skip endpoints already connected
\(AP\) 5 add adds \(P\) and completes the tree

At this point all 18 vertices belong to one component, so the MST is complete. The accepted edges are

\[AE,\ BH,\ CN,\ DI,\ GK,\ AG,\ AR,\ BM,\ BQ,\ DO,\ KM,\ CF,\ CI,\ DR,\ JR,\ IL,\ AP.\]

Their total weight is

\[1+1+1+1+1+2+2+2+2+2+2+3+3+3+3+4+5 = 38.\]

This illustrates the main logic of Kruskal’s algorithm: equal-weight edges may appear in different orders, but every accepted edge must connect two currently different components, and every skipped edge would form a cycle.

Answer: With lexicographic tie-breaking, Kruskal accepts \(AE, BH, CN, DI, GK, AG, AR, BM, BQ, DO, KM, CF, CI, DR, JR, IL, AP\) and obtains an MST of weight \(38\).

4.10. Compute Ackermann Function Values (Lecture 11, Example 7)

Compute \(A_k(1)\) for \(k = 0,1,2,3,4\).

Click to see the solution

Key Concept: Each Ackermann level is defined by iterating the previous level many times, so even the first few values grow extremely quickly.

We use the definition

\[A_0(j) = j + 1,\qquad A_k(j) = A_{k-1}^{(j+1)}(j)\ \text{for } k>0.\]

For \(k=0\):

\[A_0(1) = 2.\]

For \(k=1\):

\[A_1(1) = A_0^{(2)}(1) = A_0(A_0(1)) = A_0(2) = 3.\]

For \(k=2\):

\[A_2(1) = A_1^{(2)}(1) = A_1(A_1(1)) = A_1(3).\]

Since \(A_1(j) = A_0^{(j+1)}(j) = j + (j + 1) = 2j + 1\),

\[A_2(1) = 2 \cdot 3 + 1 = 7.\]

For \(k=3\):

\[A_3(1) = A_2^{(2)}(1) = A_2(A_2(1)) = A_2(7).\]

Now \(A_2(7)=A_1^{(8)}(7)\), so starting from 7 and repeatedly applying \(A_1(j)=2j+1\) gives

\[7 \to 15 \to 31 \to 63 \to 127 \to 255 \to 511 \to 1023 \to 2047.\]

Hence

\[A_3(1)=2047.\]

For \(k=4\):

\[A_4(1)=A_3^{(2)}(1)=A_3(A_3(1))=A_3(2047).\]

This means applying \(A_2\) exactly 2048 times starting from 2047. The value is astronomically larger than any ordinary exponential; in particular it dominates a tower of 2047 twos.

So the key values are:

\(k\) \(A_k(1)\)
0 2
1 3
2 7
3 2047
4 unimaginably large

Therefore:

\[\alpha(n)=0 \text{ for } 0 \le n \le 2,\] \[\alpha(n)=1 \text{ for } n=3,\] \[\alpha(n)=2 \text{ for } 4 \le n \le 7,\] \[\alpha(n)=3 \text{ for } 8 \le n \le 2047,\] \[\alpha(n)=4 \text{ for } 2048 \le n \le A_4(1).\]

This is why the inverse Ackermann function is treated as essentially constant in real implementations.

Answer: The values are \(A_0(1)=2\), \(A_1(1)=3\), \(A_2(1)=7\), \(A_3(1)=2047\), and \(A_4(1)\) is astronomically large; consequently, \(\alpha(n)\le 4\) for every practical input size.
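The small values are easy to verify mechanically. The sketch below transcribes the definition \(A_0(j) = j+1\), \(A_k(j) = A_{k-1}^{(j+1)}(j)\) directly into Python; it reproduces \(A_3(1) = 2047\), while \(A_4(1)\) is of course far beyond any feasible computation, so no expected value is attempted for it.

```python
def A(k, j):
    """Ackermann-style hierarchy from the lecture:
    A_0(j) = j + 1, and A_k(j) applies A_{k-1} exactly j + 1 times to j."""
    if k == 0:
        return j + 1
    x = j
    for _ in range(j + 1):                # (j + 1)-fold iteration of A_{k-1}
        x = A(k - 1, x)
    return x
```

Calling `A(k, 1)` for \(k = 0, 1, 2, 3\) returns 2, 3, 7, and 2047 respectively, matching the table above; do not attempt `A(4, 1)`.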